2 research outputs found

Catégorisation automatique de textes et cooccurrence de mots provenant de documents non étiquetés

Author: Réhel Simon
Publication venue: Bibliothèque de l'Université Laval
Publication date: 01/01/2005

Automated text categorization aims to build computer programs that autonomously assign texts to predefined categories on the basis of their content. It is made possible by supervised learning, which requires a training phase on documents that humans have already labeled with categories. Building such a training set, however, is long and expensive. This thesis proposes a way to improve a classifier's performance when training on a sufficient number of texts has not been possible. The suggested approach analyzes one form of association, cooccurrence, between words from a set of labeled texts and words from a larger set of unlabeled texts, in order to augment the vocabulary obtained from the labeled texts. The representation of new documents to classify can then be modified to better match the vocabulary used by the classifier. The hope is to enlarge, at low cost, the vocabulary useful for classification while minimizing the number of documents to label, and so to improve the classifier's ability to categorize texts.
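The abstract does not spell out how the cooccurrence analysis is carried out, so the following is only a minimal sketch of one plausible reading, not the author's method. It assumes document-level cooccurrence counts over the unlabeled texts, and it rewrites each out-of-vocabulary word of a new document as the in-vocabulary word it cooccurs with most often. The function names, the `min_count` threshold, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(unlabeled_docs):
    """Count, for each pair of distinct words, the documents containing both."""
    counts = defaultdict(Counter)
    for doc in unlabeled_docs:
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def expand_document(doc, vocabulary, counts, min_count=2):
    """Map out-of-vocabulary words to their strongest in-vocabulary associate.

    Words already in the vocabulary, and words with no associate reaching
    min_count (an assumed threshold), are kept unchanged.
    """
    result = []
    for word in doc.lower().split():
        if word in vocabulary:
            result.append(word)
            continue
        candidates = [
            (count, other)
            for other, count in counts.get(word, {}).items()
            if other in vocabulary and count >= min_count
        ]
        if candidates:
            # Highest count wins; ties are broken alphabetically.
            _, best = min(candidates, key=lambda c: (-c[0], c[1]))
            result.append(best)
        else:
            result.append(word)
    return result


# Toy data (hypothetical): a small labeled vocabulary and unlabeled texts.
vocabulary = {"goal", "vote"}
unlabeled = [
    "the striker scored a goal",
    "a goal by the striker",
    "parliament passed the vote",
    "the vote in parliament",
]
counts = cooccurrence_counts(unlabeled)
expanded = expand_document("Striker injured", vocabulary, counts)
```

Here `striker` never appears in the labeled vocabulary, but it shares two unlabeled documents with `goal`, so the new document is represented with a word the classifier already knows. A real system would likely use a weighted association measure rather than raw counts.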

CorpusUL

Antifouling properties of amphiphilic poly(3-hydroxyalkanoate): an environmentally-friendly coating

Authors: Balnois E., Brelle L., Faÿ F., Guennec A., Langlois V., Linossier I., Renard E., Simon-Colin C., Vallée-Réhel K.
Publication venue: Informa UK Limited
Publication date: 14/09/2021

International audience

HAL-Université de Bretagne Occidentale

HAL-INSU

HAL - UPEC / UPEM